Packages
This lecture relies on a small, focused toolkit organized by role.
For data wrangling and visualization, we use the tidyverse
to cover the essentials, while timetk streamlines
time-aware feature engineering, visualization, and preprocessing.
We organize modeling work using tidymodels, which brings
together workflows, parsnip, tune,
dials, recipes, rsample, and
yardstick for a consistent interface to model
specification, tuning, resampling, preprocessing, and evaluation. For
time series specifically, modeltime extends
tidymodels with forecasting workflows, and its companions
add backtesting (modeltime.resample), ensembling
(modeltime.ensemble), and AutoML integration
(modeltime.h2o).
For deep learning we bridge to Python with reticulate
and access Amazon’s GluonTS algorithms through
modeltime.gluonts, enabling state-of-the-art probabilistic
forecasting within the same tidymodels workflow.
Time series foundation models are also discussed, and Nixtla’s TimeGPT is
tested via the timegptr package. Finally, time series
agents are explored using TimeCopilot.
All the required packages can be installed and loaded using the following code:
source("src/R/utils.R")
pkgs <- c(
"devtools", "remotes",
"tidyverse",
"timetk",
"forecast", "prophet", "smooth", "thief",
"glmnet", "earth", "kernlab", "kknn",
"randomForest", "ranger", "xgboost", "bonsai", "lightgbm",
"Cubist", "rules",
"tidymodels", "modeltime", "modeltime.ensemble",
"parallel", "doFuture", "tictoc",
"reticulate"
)
install_and_load(pkgs)
# Install the CatBoost R package from the Linux release archive
devtools::install_url(
"https://github.com/catboost/catboost/releases/download/v1.0.0/catboost-R-Linux-1.0.0.tgz",
INSTALL_opts = c("--no-multiarch", "--no-test-load")
)
# Install packages from GitHub
remotes::install_github("business-science/modeltime.gluonts")
remotes::install_github("business-science/modeltime.h2o")
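To preview how these packages fit together, the sketch below specifies, fits, and forecasts a single model through the modeltime interface. It is a minimal sketch, assuming timetk's bundled m4_hourly sample data and the "H10" series ID purely for illustration:

```r
library(tidyverse)
library(timetk)
library(tidymodels)
library(modeltime)

# Assumption for illustration: timetk ships a small m4_hourly sample
data <- timetk::m4_hourly %>% filter(id == "H10")

# Hold out the last week of observations for evaluation
splits <- time_series_split(data, assess = "1 week", cumulative = TRUE)

# Specify and fit an ARIMA model via the parsnip-style interface
model_fit <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(value ~ date, data = training(splits))

# Calibrate on the holdout, forecast, and plot
modeltime_table(model_fit) %>%
  modeltime_calibrate(new_data = testing(splits)) %>%
  modeltime_forecast(new_data = testing(splits), actual_data = data) %>%
  plot_modeltime_forecast()
```

The same pattern scales to many models: each fitted model is added to the modeltime table, and calibration, forecasting, and accuracy assessment are applied to all of them at once.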
Datasets
Email Subscribers
A company decided to change how it sells its products,
converting from a purely physical store approach to a more digital
and modern solution. It therefore opened an online web store built on
an e-commerce platform, where its “virtual” customers can buy
all the merchandise.
In order to monitor this new business solution, it adopted a few
well-known data analytics tools.
Google Analytics has been set up on the web store pages to collect data related to page views, sessions, and organic searches. This could help the company understand whether its website is gaining popularity.
Moreover, MailChimp is used to track all the customers who buy a product and subscribe to the web store.
Finally, marketing events such as discount programs and new product launches are promoted through several social network channels.
All these data are stored in the company database and can be used to analyze the factors that impact the web store's sales.
M4 Competition Hourly
The M4 Competition is a well-known time series forecasting competition organized by Spyros Makridakis. The competition provides a large dataset of time series from various domains, including finance, economics, and demographics. The goal of the competition is to develop accurate forecasting models for these time series.
https://www.unic.ac.cy/iff/research/forecasting/m-competitions/m4/
We will use a sample of the M4 Hourly dataset, which consists of hourly time series data. The dataset contains multiple time series, each identified by a unique ID.
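The sample can be inspected with tidyverse and timetk tools. The sketch below assumes timetk's bundled m4_hourly dataset, with one row per observation and columns id, date, and value:

```r
library(tidyverse)
library(timetk)

# Summarize each series: number of observations and date range
timetk::m4_hourly %>%
  group_by(id) %>%
  summarise(n_obs = n(), start = min(date), end = max(date))

# Visualize all series, one panel per ID
timetk::m4_hourly %>%
  group_by(id) %>%
  plot_time_series(date, value, .facet_ncol = 2, .interactive = FALSE)
```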